The Benefit of Syntactic vs. Linear N-grams for Linguistic Description
نویسندگان
چکیده
Automatic dependency annotations have been used in all kinds of language applications. However, there has been much less exploitation of dependency annotations for the linguistic description of language varieties. This paper presents an attempt to employ dependency annotations for describing style. We argue that for this purpose, linear n-grams (that follow the text’s surface) alone do not appropriately represent a language like German. For this claim, we present theoretically as well as empirically founded arguments. We suggest syntactic n-grams (that follow the dependency paths) as a possible solution. To demonstrate their potential, we compare the German academic languages of linguistics and literary studies using both linear and syntactic n-grams. The results show that the approach using syntactic n-grams allows for the detection of linguistically meaningful patterns that do not emerge in a linear n-gram analysis, e. g. complex verbs and light verb constructions.
منابع مشابه
Syntactic Structures and Rhetorical Functions of Electrical Engineering, Psychiatry, and Linguistics Research Article Titles in English and Persian: A Cross-linguistic and Cross-disciplinary Study
A research article (RA) title is the first and foremost feature that attracts the reader's attention, the feature from which she/he may decide whether the whole article is worth reading. The present study attempted to investigate syntactic structures and rhetorical functions of RA titles written in English and Persian and published in journals in three disciplines of Electrical Engineering, Psy...
متن کاملN-gramas sintácticos no-continuos
In this paper, we present the concept of noncontinuous syntactic n-grams. In our previous works we introduced the general concept of syntactic n-grams, i.e., n-grams that are constructed by following paths in syntactic trees. Their great advantage is that they allow introducing of the merely linguistic (syntactic) information into machine learning methods. Certain disadvantage is that previous ...
متن کاملThe Predictive Power of Syntactic Knowledge, Vocabulary Breadth and Metacognitive Strategies for L2 Reading Fluency
Fluent reading is a multifaceted ability that integrates several linguistic and non-linguistic processes. Accordingly, recognizing the critical components of fluent reading is highly significant in planning and implementing effective reading programs. This study was undertaken to evaluate the predictive power of syntactic knowledge, vocabulary breadth, and metacognitive awareness of reading str...
متن کاملDependency vs. Constituent Based Syntactic N-Grams in Text Similarity Measures for Paraphrase Recognition
Paraphrase recognition consists in detecting if an expression restated as another expression contains the same information. Traditionally, for solving this prob lem, several lexical, syntactic and semantic based tech niques are used. For measuring word overlapping, most of the works use n-grams; however syntactic n-grams have been scantily explored. We propose using syntac tic dependency and...
متن کاملDeveloping a hybrid NP parser
We describe the use of energy function optimisation in very shallow syntactic parsing. The approach can use linguistic rules and corpus-based statistics, so the strengths of both linguistic and statistical approaches to NLP can be combined in a single framework. The rules are contextual constraints for resolving syntactic ambiguities expressed as alternative tags, and the statistical language m...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017